system, leading to energy saving. However, the challenge is to obtain 
reliable operation data from an actual building. Therefore, a lab-scaled 
centralized chilled water air conditioning system was developed in this 
paper. All necessary sensors were installed to generate reliable operation 
data for the data-driven FDD. Nevertheless, if a practical system is 
considered, the number of sensors required would be extensive, as it depends 
on the number of rooms in the building. Hence, the impact of each parameter 
in the dataset was also investigated to identify the critical parameters for 
fault classification. The analysis identified four critical parameters for 
data-driven FDD: the rooms' temperature (Trcx), supplied chilled water 
temperature (Tcuws), supplied chilled water flow rate (Vcuws) and supplied 
cooled water temperature (Tcws). Results showed that the data-driven FDD 
correctly diagnosed all six conditions with the proposed parameters at an 
accuracy of 92.3% or higher, only 0.6-3.4% lower than the accuracy obtained 
with the original dataset. Therefore, the proposed parameters can reduce 
the number of sensors used in practical buildings, thus reducing installation 
costs without compromising the FDD accuracy. 
1. INTRODUCTION 

Faults in air conditioning systems, especially soft faults, are hard to detect. Even a regularly 
maintained building may suffer from soft faults without anyone realising it [1]. Therefore, fault detection 
and diagnosis (FDD) plays an important role in building energy savings. Successful FDD can save up to 40% of 
air conditioning energy consumption [2]. One FDD method is model-based FDD, which relies on 
mathematical modelling to represent the system. Detailed physical modelling derived from first 
principles, as proposed in [3]-[5], is the most accurate way to describe the air conditioning system. 
However, since the system is complex and dynamic, such models are challenging to derive and require 
detailed information about the system [6]. In contrast, simplified mathematical models using a lumped 
parameter approach, as developed in [7], [8], are simpler to derive. However, the number of available fault 
models of air conditioning systems is still limited [9]. One reason is that most models are developed for a 
specific system, so adjustments are needed before they can be used in other types of air conditioning system. 

Recently, researchers have increasingly explored data-driven FDD because it is simple yet reliable. The 
method has gained much interest in many areas, such as air conditioning systems [10]-[14], 
power generation systems [15]-[19] and motor drive systems [20], [21]. It is simple to develop 
because it only requires historical operational data for training and validation. However, it requires 
fault-free training data to classify other faults. Otherwise, the classifier model would recognize 
faults as the standard operating performance. 

Current FDD research for air conditioning systems focuses only on individual components, such as the 
chiller in [10], [14], [22] and the air handling unit (AHU) in [23]-[27]. However, no FDD research 
considers faults across the entire air-conditioning system, even though all components are interconnected [9]. 
Thus, faults in one component may affect other components' parameters. Therefore, by combining faults 
across the entire system, the ability of the FDD system to diagnose the correct faults can be analysed. To fill 
this gap, Chen [28] proposed data-driven FDD using a Bayesian network (BN) for whole-building 
faults, including faults across the chiller, AHU, and operation schedule. However, that research does not 
cover faults in the cooling tower, which is also a component of the air conditioning system. Another 
limitation of that research is that some faults may not be identified under certain weather, operation, or 
internal load conditions. Indeed, this is one of the biggest challenges for data-driven FDD in an actual building. 

There are many challenges in obtaining reliable fault-free and faulty operation data from an actual 
building. Firstly, the initial building operation data might differ from the data obtained later in the 
building's lifetime. Furthermore, external factors, such as environment and usage patterns, may vary the 
results, as in [28]. It is also a challenge to simulate faults in actual buildings, as doing so may disturb the 
thermal comfort of the occupants. Therefore, in our previous studies [13], [29], we developed a lab-scaled 
chilled water air conditioning system. The data was used to develop three machine learning models in [13]: 
deep learning, support vector machine (SVM) and multi-layer perceptron (MLP), for data-driven FDD of 
faults across the entire system, namely the chiller, AHU, and cooling tower. 
Results showed that all models identified all faults with more than 95% accuracy. 

Deep learning, SVM and MLP are among the most widely used methods for classification. For instance, 
deep learning was successfully applied to FDD in the Tennessee Eastman (TE) process, as presented in [30], 
where the deep learning model outperformed five other classifier models. Likewise, Yan et al. [31] 
successfully applied SVM to FDD in a chiller system. SVM also showed the highest accuracy among the 
compared methods in detecting breast cancer [32], [33]. Meanwhile, MLP successfully diagnosed bladder 
cancer and predicted faults in yacht hydrodynamics, as reported in [34], [35]. 

Even though the FDD in [13] successfully diagnosed the faults, it requires many sensors to be 
implemented in actual buildings. However, most air conditioning systems in non-residential buildings 
have a limited number of sensors, and most of them were installed for control purposes only [6]. Hence, 
adding more sensors to the building incurs a substantial additional cost. Furthermore, the accuracy of the 
data-driven method depends on the parameter data collected from the system. The more parameters in the 
dataset, the better the FDD accuracy, and the bigger the system, the more parameters are required. 
Therefore, it is essential to identify the impact of each parameter on the ability to detect faults. 
Unimportant parameters can then be eliminated to reduce the installation cost without compromising FDD 
accuracy. Thus, FDD with the proposed parameters can still avoid unnecessary energy wastage at a smaller 
installation cost. 

In this paper, the impact of each parameter in FDD was investigated to identify the critical 
parameters. New dataset combinations were developed based on standard deviation and accuracy percentage 
values. Each combination was then evaluated using the deep learning, SVM and MLP models developed in [13]. 
The performance of the proposed critical parameters was then compared with the performance of the original 
dataset in [13]. This paper is organised in four sections. Some research background is presented in 
section 1. The research methodology is presented in section 2, which includes the development of 
the lab-scaled system, the fault simulation on the system and the investigation of each parameter's impact. 
Section 3 elaborates the outcome of this research in detail. Lastly, the conclusions are given in section 4. 


2. RESEARCH METHOD 

This section explains the research methodology. It involves the development of the 
lab-scaled system and the methods used to select the parameters. The lab-scaled system was developed to 
generate reliable data for the FDD, whereas the values of standard deviation and accuracy were used to 
investigate the impact of the parameters generated by the system. 
2.1. Lab-scaled centralised chilled water air conditioning system 

Figure 1 shows the lab-scaled system developed in this research as described in [7], [8], [13]. It 
consists of a chiller, cooling tower, AHU, and two rooms to replicate an actual centralised chilled water air 
conditioning system. The chiller used is a ready-made chiller system equipped with a chilled water tank, and 
the cooling tower is designed as a counter flow type. The AHU system has a cooling coil, a fan, supply and 
return ducts for each room, and dampers. The speed of the fan can be varied to achieve a specific supplied 
airflow rate. The rooms were constructed from insulated board and polycarbonate, and each measures 
2.4x1.2x1.6 m. Five bulbs rated at 100 W each were installed in each room to simulate heat from equipment 
and occupants. 

[Figure 1 image; labelled parts: Room 1, Room 2, Air Ducting, Control board, Cooling Tower, Chiller] 

Figure 1. The lab-scaled chilled water system 


The system is a set of standalone, self-contained equipment. It has a structured platform that 
accommodates the cooling tower, water-cooled chiller, and AHU system. Two rooms were installed next to 
the platform. Four lockable castor wheels were mounted at the bottom of the platform for 
easy mobility. The size of the platform is 64 cm (W)x150 cm (L)x170 cm (H). A control board is used to 
control and operate the system. The system was equipped with fourteen sensors: thermocouple sensors, water 
flow rate sensors, airflow rate sensors, and a current sensor. The details of each sensor and the parameters 
measured are tabulated in Table 1. The system coefficient of performance (COP) was also analyzed and 
presented in Sulaiman et al. [13]. The results show that the COP decreases when the system has faults, which 
is consistent with the energy audit results of the actual system presented in Othman et al. [1]. 


Table 1. List of the sensors in the lab-scaled system 
Sensor type              Parameters measured 
Temperature sensor       Trc1 = Air temperature in Room 1 
                         Trc2 = Air temperature in Room 2 
                         Ts1 = Air temperature at ducting Room 1 
                         Ts2 = Air temperature at ducting Room 2 
                         Tcuws = Supplied chilled water temperature 
                         Tcuwr = Returned chilled water temperature 
                         Tcws = Supplied cooled water temperature 
                         Tcwr = Returned cooled water temperature 
Airflow rate sensor      Vs1 = Airflow rate at ducting Room 1 
                         Vs2 = Airflow rate at ducting Room 2 
Water flow rate sensor   Vcuws = Supplied chilled water flow rate 
                         Vcuwr = Returned chilled water flow rate 
                         Vcws = Supplied cooled water flow rate 
Current sensor           Ccu = Compressor current 


All parameters in Table 1 were logged while various conditions were simulated in the lab-scaled 
system. The simulated conditions are described in Table 2, together with the location and type of each 
fault. They include five faults throughout the entire system and one normal, fault-free condition. Three 
machine learning models were used to classify all conditions, as described in Table 3, which also lists the 
parameter setting for each model. All classifier models successfully identified all conditions, as 
presented in Sulaiman et al. [13]. 

Table 2. List of conditions simulated in the lab-scaled system [13] 
Condition                  Location of fault   Type of fault 
Normal (no-fault)          -                   - 
Evaporator Clogging        Chiller             Soft 
Compressor Failure         Chiller             Abrupt 
Cooling Tower Fan Faulty   Cooling Tower       Soft 
Damper Stuck               AHU                 Soft 
Air Ducting Leakage        AHU                 Soft 

Table 3. Simulation parameters [13] 
Models          Parameter setting 
Deep learning   Activation function for hidden layer: sigmoid 
                Activation function for the output layer: softmax 
                Optimization: stochastic gradient descent (SGD) 
SVM             Kernel function: polykernel 
MLP             Activation function: sigmoid 


2.2. Parameter selection 

Out of the fourteen sensors, six were installed in the two rooms, three for each room. If a 
practical system is considered, the number of sensors required would be extensive, as it depends on the 
number of rooms in the actual building. In other words, the cost grows because three sensors are required 
for each room. Therefore, it is essential to investigate the impact of these parameters in classifying the 
faults. Insignificant sensors can be eliminated to reduce installation costs. However, the elimination must 
not affect accuracy. Table 4 lists the sensors and their locations throughout the entire system. The data were 
categorised into two groups: Group A is the set of parameters related to the rooms, and Group B is the set 
of parameters associated with the central unit. 


Table 4. List of sensors and their locations in the lab-scaled system 
Group                                           Location of the sensors          Parameters measured 
Group A (Sensors located at rooms)              Room 1                           Trc1, Ts1, Vs1 
                                                Room 2                           Trc2, Ts2, Vs2 
Group B (Sensors located at the central unit)   The central unit of the system   Tcuws, Tcuwr, Tcws, Tcwr, 
                                                                                 Vcuws, Vcuwr, Vcws, Ccu 


In general, the number of parameters can be expressed as (1), 

$$n_p = n_A N_{\mathrm{room}} + n_B = 3N_{\mathrm{room}} + 8, \tag{1}$$

where $n_p$ represents the total number of parameters, $n_A$ is the number of Group A parameters per room, 
$N_{\mathrm{room}}$ is the total number of rooms, and $n_B$ is the number of parameters from Group B. Equation (1) 
indicates that the number of parameters increases with the number of rooms in the system. Therefore, it is 
essential to identify the critical parameters needed to detect all six conditions in FDD. Doing so minimizes 
the number of sensors used in a practical system and eventually reduces the cost. The values of standard 
deviation and accuracy were used to investigate the impact of these parameters on detecting faults without 
compromising the performance of the classifiers. 


2.3. Standard deviation 

In statistics, the standard deviation is a measure of variation in a dataset. A low standard 
deviation indicates that the data lie close to the mean value. In contrast, a high value indicates that the 
data cover a broader range and lie farther from the mean. In this paper, the standard deviation is used 
to identify which parameters changed notably throughout the simulation. Thus, it can be used to analyse 
the impact of parameter selection on identifying the faults. Table 5 shows the standard deviation of each 
parameter in the dataset. The subscript x in the Group A parameters denotes the room number, where 
$x = 1, 2, \ldots, N_{\mathrm{room}}$. 

Table 5 shows that Vsx and Tcuws have the highest standard deviation in their respective groups. In 
contrast, Trcx and Vcws have the lowest standard deviations. This shows that Vsx and Tcuws 
changed significantly during the simulations compared with Trcx and Vcws. Therefore, a higher 
standard deviation may represent a more significant response to the simulated faults. It is also 
possible that parameters with a low standard deviation are less critical in fault classification and can be 
removed from the dataset. 
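
The ranking step described above can be sketched in a few lines of Python. This is only an illustration: the sample values are hypothetical and are not the logged data of the lab-scaled system, the function name `rank_by_std` is invented here, and the population standard deviation is an assumption because the text does not state which form was used.

```python
import statistics


def rank_by_std(dataset):
    """Return (parameter, std) pairs sorted from most to least variable."""
    # Assumption: population standard deviation; statistics.stdev would
    # give the sample form instead.
    stds = {name: statistics.pstdev(values) for name, values in dataset.items()}
    return sorted(stds.items(), key=lambda item: item[1], reverse=True)


# Hypothetical logged samples for the three Group A parameters of one room.
samples = {
    "Trcx": [24.0, 25.0, 26.0, 25.0],
    "Tsx": [15.0, 20.0, 25.0, 20.0],
    "Vsx": [10.0, 30.0, 20.0, 40.0],
}

ranking = rank_by_std(samples)
for name, std in ranking:
    print(f"{name}: {std:.2f}")
```

Parameters at the bottom of such a ranking are the first candidates for removal when the reduced datasets are formed.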

The parameter selections for the new datasets of Group A and Group B are described in Table 6 and 
Table 7, respectively. One parameter was eliminated for every dataset formed in both Table 6 and Table 7. 
The datasets were formed based on the standard deviation values shown in Table 5. Datasets in Group A and 
Group B were then combined one by one as (2), 


Combination dataset = { A1B1; A1B2; ...; A2B1; A2B2; ...; AnBm}, (2) 


where n is the number of datasets in Group A, and m is the number of datasets in Group B. Each 
combination was tested and compared with all three machine learning classifiers in Table 3. 
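
As a minimal sketch of how the combinations in (2) can be enumerated, the snippet below pairs every Group A dataset with every Group B dataset. The dataset contents follow Table 6 and Table 7; the variable and function names are illustrative, and the classifier training itself is not shown.

```python
from itertools import product

# Group A datasets as listed in Table 6.
group_a = {
    "A1": ["Vsx", "Tsx"],
    "A2": ["Trcx", "Vsx"],
    "A3": ["Trcx", "Tsx"],
    "A4": ["Vsx"],
    "A5": ["Tsx"],
    "A6": ["Trcx"],
}

# Group B parameters in the order used in Table 7; dataset Bk keeps the
# first 8 - k parameters of this list.
ranked_b = ["Tcuws", "Tcwr", "Tcws", "Tcuwr", "Vcuws", "Vcuwr", "Ccu", "Vcws"]
group_b = {f"B{k}": ranked_b[: len(ranked_b) - k] for k in range(1, 8)}


def combine(datasets_a, datasets_b):
    """Pair every Group A dataset with every Group B dataset."""
    return {
        name_a + name_b: params_a + params_b
        for (name_a, params_a), (name_b, params_b) in product(
            datasets_a.items(), datasets_b.items()
        )
    }


combinations = combine(group_a, group_b)
print(len(combinations), combinations["A4B3"])
```

With six Group A datasets and seven Group B datasets, this yields 42 candidate combinations to evaluate.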


Table 5. Standard deviation value for all parameters in the dataset 
Group     Parameters   Standard deviation 
Group A   Vsx          10.03 
          Tsx          5.2 
          Trcx         2.57 
Group B   Tcuws        5.67 
          Tcwr         5.31 
          Tcuwr        4.03 
          Vcuws        2.38 
          Vcuwr        1.83 
          Ccu          1.53 

Table 6. The selection of parameters in Group A 
Dataset    List of parameters 
Original   Vsx, Tsx, Trcx 
A1         Vsx, Tsx 
A2         Trcx, Vsx 
A3         Trcx, Tsx 
A4         Vsx 
A5         Tsx 
A6         Trcx 

Table 7. The selection of parameters in Group B 
Dataset    List of parameters 
Original   Tcuws, Tcwr, Tcws, Tcuwr, Vcuws, Vcuwr, Ccu, Vcws 
B1         Tcuws, Tcwr, Tcws, Tcuwr, Vcuws, Vcuwr, Ccu 
B2         Tcuws, Tcwr, Tcws, Tcuwr, Vcuws, Vcuwr 
B3         Tcuws, Tcwr, Tcws, Tcuwr, Vcuws 
B4         Tcuws, Tcwr, Tcws, Tcuwr 
B5         Tcuws, Tcwr, Tcws 
B6         Tcuws, Tcwr 
B7         Tcuws 


2.4. Accuracy 

The accuracy of the deep learning classifier was analysed when one of the parameters was removed 
from the dataset. The results represent the ability of the classifier to identify and classify the faults. 
A high accuracy after a parameter is removed from the dataset indicates that the parameter has little 
impact on the fault classification. Conversely, if the accuracy decreases considerably when the 
parameter is eliminated, the parameter significantly impacts the fault classification. The 
results are presented in Table 8, while Table 9 shows the parameter selection for the new datasets of Group B. 
The datasets of Group A remain unchanged, as in Table 6. Similarly, each dataset combination was tested 
and compared using the three machine learning classifiers. 
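
A minimal sketch of this ranking, using the Group B accuracies reported in Table 8, is shown below. The function name is illustrative, and the nested subsets are only one plausible way to turn the ranking into reduced datasets.

```python
# Deep learning accuracy (%) after deleting each Group B parameter (Table 8).
ablation_accuracy = {
    "Tcws": 91.5,
    "Tcuws": 93.7,
    "Vcuws": 93.8,
    "Tcwr": 94.0,
    "Ccu": 94.0,
    "Tcuwr": 94.1,
    "Vcws": 94.3,
    "Vcuwr": 94.5,
}


def rank_by_ablation(accuracy_after_removal):
    """Order parameters from most to least critical.

    A parameter whose removal leaves the lowest accuracy is treated as
    the most critical one.
    """
    return sorted(accuracy_after_removal, key=accuracy_after_removal.get)


ranking = rank_by_ablation(ablation_accuracy)

# Nested subsets: drop the least critical parameter one at a time.
nested = [ranking[:size] for size in range(len(ranking) - 1, 0, -1)]
for subset in nested:
    print(subset)
```

Tcwr and Ccu tie at 94.0%. Python's sort is stable, so they keep their input order, and the sketch does not attempt to produce the additional tied variant listed in Table 9.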


Table 8. The accuracy of the classifier when each of these parameters was deleted from the original dataset 
Group     Parameter deleted   Accuracy 
Group A   Vsx                 91.4% 
          Trcx                93.1% 
          Tsx                 94.0% 
Group B   Tcws                91.5% 
          Tcuws               93.7% 
          Vcuws               93.8% 
          Tcwr                94.0% 
          Ccu                 94.0% 
          Tcuwr               94.1% 
          Vcws                94.3% 
          Vcuwr               94.5% 

Table 9. The selection of parameters in Group B 
Dataset    List of parameters 
Original   Tcuws, Tcwr, Tcws, Tcuwr, Vcuws, Vcuwr, Ccu, Vcws 
B11        Tcws, Tcuws, Vcuws, Tcwr, Ccu, Tcuwr, Vcws 
B12        Tcws, Tcuws, Vcuws, Tcwr, Ccu, Tcuwr 
B13        Tcws, Tcuws, Vcuws, Tcwr, Ccu 
B14        Tcws, Tcuws, Vcuws, Tcwr 
B15        Tcws, Tcuws, Vcuws, Ccu 
B16        Tcws, Tcuws, Vcuws 
B17        Tcws, Tcuws 
B18        Tcws 


3. RESULTS AND DISCUSSION 

Table 10 shows the best combination datasets formed using both methods discussed in 
the previous section. For each Group A dataset, the combination with the fewest parameters that still 
achieved at least 90% accuracy was selected. For instance, dataset A1B5 combines dataset A1 from Group A 
and dataset B5 from Group B; it was the combination with dataset A1 that reached 90% accuracy with the 
minimum number of parameters. Dataset A5 is not listed because all of its combinations with the Group B 
datasets produced below 90% accuracy. The number of parameters required for each dataset was derived as 
in (1). The first term of the expression represents the parameters from Group A, while the second term 
represents the parameters from Group B. Based on these expressions, the number of sensors depends on the 
number of rooms in the system. The results show that datasets A4B3, A6B3, A4B16, and A6B16 require the 
fewest sensors as the number of rooms increases. 


Table 10. Results for the best combination datasets formed 
Method               Dataset   Number of parameters   Number of sensors for 
                               required, np           Nroom=1   Nroom=2   Nroom=3   Nroom=4 
Standard deviation   A1B5      2Nroom + 3             5         7         9         11 
                     A2B5      2Nroom + 3             5         7         9         11 
                     A3B4      2Nroom + 4             6         8         10        12 
                     A4B3      Nroom + 5              6         7         8         9 
                     A6B3      Nroom + 5              6         7         8         9 
Accuracy             A1B16     2Nroom + 3             5         7         9         11 
                     A2B17     2Nroom + 2             4         6         8         10 
                     A3B16     2Nroom + 3             5         7         9         11 
                     A4B16     Nroom + 3              4         5         6         7 
                     A6B16     Nroom + 3              4         5         6         7 
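
The sensor counts in Table 10 follow directly from the form of (1). The sketch below recomputes them for a few datasets; the function name and the selection of datasets are illustrative only.

```python
def sensor_count(n_group_a, n_group_b, n_rooms):
    """Total sensors: Group A parameters repeat per room, Group B do not."""
    return n_group_a * n_rooms + n_group_b


# (Group A parameters per room, Group B parameters) for selected datasets.
datasets = {
    "Original": (3, 8),
    "A1B5": (2, 3),
    "A4B3": (1, 5),
    "A2B17": (2, 2),
    "A6B16": (1, 3),
}

for name, (n_a, n_b) in datasets.items():
    counts = [sensor_count(n_a, n_b, rooms) for rooms in range(1, 5)]
    print(f"{name}: {counts}")
```

Datasets with a single Group A parameter grow by one sensor per added room, which is why they need the fewest sensors as the number of rooms increases.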


Based on the results in Table 10, the combinations built on datasets A4 and A6 were identified, for 
both the standard deviation and accuracy selection methods, as requiring the minimum number of 
sensors. Table 11 compares the classification results from our previous study [13] with these 
datasets from Table 10: A4B3, A6B3, A4B16, and A6B16. Three machine learning classifiers, deep learning, 
support vector machine (SVM), and multi-layer perceptron (MLP), were used to measure the accuracy of all 
five datasets. The accuracy of the newly combined datasets was slightly lower than that of the original 
dataset in [13], by around 0.6%-3.4%. Nonetheless, the differences are small, and the results are still reliable. 


Table 11. Comparison results between the original dataset, Dataset A4B3, A6B3, A4B16, and A6B16 

Classification accuracy   Original dataset [13]   Dataset A4B3   Dataset A6B3   Dataset A4B16   Dataset A6B16 
Deep learning             94%                     93.2%          91.8%          93.4%           92.3% 
SVM                       97%                     94.6%          94.3%          94.3%           93.6% 
MLP                       99.4%                   97.5%          97.3%          97.4%           96.6% 

Parameters              Group A          Group B 
Original dataset [13]   Vsx, Tsx, Trcx   Tcuws, Tcwr, Tcws, Tcuwr, Vcuws, Vcuwr, Ccu, Vcws 
Dataset A4B3            Vsx              Tcuws, Vcuws, Tcws, Tcuwr, Tcwr 
Dataset A6B3            Trcx             Tcuws, Vcuws, Tcws, Tcuwr, Tcwr 
Dataset A4B16           Vsx              Tcuws, Vcuws, Tcws 
Dataset A6B16           Trcx             Tcuws, Vcuws, Tcws 
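
The 0.6%-3.4% range quoted for Table 11 can be checked with a short calculation. The sketch below uses the accuracies from Table 11; the function and variable names are illustrative.

```python
# Classification accuracy (%) from Table 11.
original = {"Deep learning": 94.0, "SVM": 97.0, "MLP": 99.4}
reduced = {
    "A4B3": {"Deep learning": 93.2, "SVM": 94.6, "MLP": 97.5},
    "A6B3": {"Deep learning": 91.8, "SVM": 94.3, "MLP": 97.3},
    "A4B16": {"Deep learning": 93.4, "SVM": 94.3, "MLP": 97.4},
    "A6B16": {"Deep learning": 92.3, "SVM": 93.6, "MLP": 96.6},
}


def accuracy_drops(baseline, candidates):
    """Accuracy lost by each reduced dataset, in percentage points."""
    return {
        dataset: {
            model: round(baseline[model] - accuracy, 1)
            for model, accuracy in accuracies.items()
        }
        for dataset, accuracies in candidates.items()
    }


drops = accuracy_drops(original, reduced)
all_drops = [drop for per_model in drops.values() for drop in per_model.values()]
print(min(all_drops), max(all_drops))
```

The smallest drop is the deep learning result for A4B16 and the largest is the SVM result for A6B16, which gives the quoted range of 0.6 to 3.4 percentage points.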


The original dataset has three Group A parameters and eight Group B parameters. In comparison, 
datasets A4B3 and A6B3 each have one Group A parameter and five Group B parameters. Although their 
Group A parameters differ, their Group B parameters are the same. The same holds for datasets A4B16 and 
A6B16, which each have one parameter from Group A and the same three parameters from Group B. 
Moreover, the parameters of dataset B16 are a subset of the parameters of dataset B3. It can be concluded 
that Tcuws, Vcuws, and Tcws are among the critical parameters in Group B for classifying all six conditions. 
As for Group A, either Trcx or Vsx can be regarded as equally crucial for data-driven FDD because both 
datasets had similar accuracy. Mathematically, the list of proposed parameters for data-driven FDD of a 
centralised chilled water air conditioning system can be written as (3), 


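$$P = \{P_{\mathrm{room}}, P_{\mathrm{central}}\}, \tag{3}$$

where $P_{\mathrm{room}} = \{T_{rcx} \mid x = 1, 2, \ldots, N_{\mathrm{room}}\}$ OR $\{V_{sx} \mid x = 1, 2, \ldots, N_{\mathrm{room}}\}$ 
and $P_{\mathrm{central}} = \{T_{cuws}, V_{cuws}, T_{cws}\}$. 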

The minimum number of parameters required to successfully identify the six conditions, as described in 
Table 10, can be expressed as (4), 


$$n_p = N_{\mathrm{room}} + 3, \tag{4}$$


where $N_{\mathrm{room}}$ represents the total number of rooms. The constant 3 accounts for the three critical 
Group B parameters, Tcuws, Vcuws and Tcws. The remaining term, which scales with the number of rooms, 
corresponds to either Vsx or Trcx. In this research, two thermocouples were used to measure Trcx, the 
temperature of each room, while an airflow sensor, model SD2001 from ifm electronic, was used to measure 
Vsx. An airflow sensor is far more expensive than a thermocouple. Therefore, in terms of cost, the four 
parameters in Dataset A6B16 can be considered the critical parameters for identifying the six conditions in 
this research, at a lower cost than Dataset A4B16. Although the accuracy of Dataset A4B16 was slightly 
higher than that of Dataset A6B16, the difference was small, and the accuracy of Dataset A6B16 was still 
above 90%. 


4. CONCLUSION 

This paper has presented a lab-scaled centralized chilled water air-conditioning system developed 
to represent an actual system. It is a complete system with a cooling tower, chiller, AHU and two 
rooms. Six conditions were successfully simulated in the lab-scaled system and presented in our previous 
study. However, if a practical system is considered, the number of sensors required would be extensive, as it 
depends on the number of rooms in the building. In other words, the cost grows as the number of 
sensors increases with the number of rooms. Therefore, this paper has proposed critical parameters for 
data-driven FDD of a centralized chilled water system. The impact of each parameter was identified and 
carefully analyzed to maintain good FDD accuracy. Four critical parameters were proposed in this paper: 
the rooms' temperature, Trcx, the supplied chilled water temperature, Tcuws, the supplied chilled water flow 
rate, Vcuws, and the supplied cooled water temperature, Tcws. Results showed that the data-driven FDD 
successfully diagnosed all six conditions with the proposed parameters at an accuracy of 92.3% or higher. 
Furthermore, the results differed by only 0.6-3.4% from those of our previous study. With the proposed 
parameters, only sensors for the critical parameters need to be installed in an actual building, which 
reduces the sensor installation cost. 